[benchmarks] Reworked the conversion benchmark and added more tests for up/down casts #4800
Conversation
Force-pushed from b5381cf to 73136d9
y = tl.cast(y, y_type, bitcast=True)
tl.store(y_ptr + offsets, y, mask=mask)
...
kernel.__name__ = kernel.__qualname__ = name
Can you explain why we want to change the kernel name here?
It makes it much easier to find and compare the kernel IRs in the cache. For example, I want to compare the rtne and rtz IRs:
$ find ~/.triton/cache/ -name '*fp32_to_fp16*.llir'
/home/jovyan/.triton/cache/PXX2VZY5SPACCMUKHEIEDQEEB2CN2A562JHDMIVPLHTS6F6LSRAA/rtne_fp32_to_fp16_conversion_kernel.llir
/home/jovyan/.triton/cache/CYCEIIP4OQCRWTMZHZHLPXO63ZR5J7ZEPN4KCM72NVOITFO5FMRA/rtz_fp32_to_fp16_conversion_kernel.llir
I think we can keep this for the development stage only: from a production perspective there is no performance or visualization improvement from this change, but it makes the code structure inconsistent with the other benchmarks, which reduces readability. Without this change we also would not need lru_cache or _make_kernel and could keep using triton.jit as a decorator.
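For illustration, a minimal sketch of the plain @triton.jit decorator form (the target type is hard-coded to fp16 here; the actual kernel in this PR takes the source/target types and the rounding mode as parameters):

```python
import triton
import triton.language as tl


@triton.jit
def conversion_kernel(x_ptr, y_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.cast(x, tl.float16)  # target type hard-coded for illustration
    tl.store(y_ptr + offsets, y, mask=mask)
```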
It's very convenient to use benchmarks for development and testing. This benchmark produces 20 kernels, and without the name change it's not easy to recognize the required kernel in the cache.
tl.store(y_ptr + offsets, y, mask=mask)
...
kernel.__name__ = kernel.__qualname__ = name
return triton.jit(kernel)
I would suggest using triton.jit as a decorator, as the other benchmarks do.
The decorator does not allow changing the name.
if r == 'rtne' or s == tl.float32]
...
@lru_cache
Triton has its own cache mechanism. You can check the kernel JIT cache under TRITON_CACHE_DIR if it is specified. We do not need to use lru_cache.
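For example, the cache can be pointed at a scratch directory so the benchmark's artifacts are easy to find (a sketch; the path is arbitrary, and the variable has to be set before any kernel is compiled):

```python
import os

# Redirect Triton's compilation cache to a known location (arbitrary path);
# set it before any kernel is compiled so the artifacts end up here.
os.environ["TRITON_CACHE_DIR"] = "/tmp/triton-bench-cache"

import triton  # noqa: E402  -- imported after the environment variable is set
```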
Right, but a JITFunction object is created on each call to _make_kernel(). I'm not sure whether that has a significant impact on the benchmark results.
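To make the trade-off concrete, here is a minimal sketch of the memoized factory under discussion (the real kernel body, the type parameters, and the rounding mode are elided):

```python
from functools import lru_cache

import triton
import triton.language as tl


@lru_cache
def _make_kernel(name):
    # Build a plain Python function, give it a recognizable name so its
    # compiled artifacts are easy to spot in the Triton cache, then JIT it.
    def kernel(x_ptr, y_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements
        x = tl.load(x_ptr + offsets, mask=mask)
        tl.store(y_ptr + offsets, x, mask=mask)

    kernel.__name__ = kernel.__qualname__ = name
    # Without lru_cache, a fresh JITFunction would be constructed on every call;
    # memoizing by name reuses the same JITFunction (and its compile cache) instead.
    return triton.jit(kernel)
```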
return gbps(ms), gbps(max_ms), gbps(min_ms)
...
def _report(plot_name, line_names, line_vals):
    report = triton.testing.perf_report(
        triton.testing.Benchmark(
It's recommended to use triton_kernels_benchmark instead of triton.testing.Benchmark to get an accurate kernel time cost. It's better to use perf_report and benchmark in Python decorator form to keep consistent with the other benchmarks. If you want to add more arguments to the benchmark, you can refer to
intel-xpu-backend-for-triton/benchmarks/triton_kernels_benchmark/gemm_benchmark.py
Lines 298 to 342 in 4f35147
```python
def get_benchmark(
        providers_filter: Optional[list[str]] = None,
        transpose_a=False,
        transpose_b=False,
        matmul_kernel=matmul_kernel_with_block_pointers,
        matmul_kernel_batched=matmul_kernel_with_block_pointers_batched,
        plot_name='matmul-performance',
):
    """
    Returns a Mark object containing a Benchmark object constructed at runtime and parameterized by the provided option values.
    The benchmark can then be executed by calling the :code:`.run` method on the return value.
    """
    supported_providers = {
        'triton': 'Triton',
        'onednn': 'OneDNN',
    }
    # use_cutlass
    if not (transpose_a or transpose_b):
        if torch.xpu.get_device_name() != 'Intel(R) Arc(TM) Graphics':
            # FIXME: enable cutlass on LNL
            supported_providers['cutlass'] = 'CUTLASS'
    providers = benchmark_suite.filter_providers(supported_providers, providers_filter)

    # Benchmark Performance
    # pylint: disable=too-many-branches
    @benchmark_suite.perf_report(
        benchmark_suite.Benchmark(
            # argument names to use as an x-axis for the plot
            x_names=['B', 'M', 'N', 'K'],
            # different possible values for `x_name`
            x_vals=X_VALS,
            line_arg='provider',
            # argument name whose value corresponds to a different line in the plot
            # possible values for `line_arg``
            line_vals=list(providers.keys()),
            # label name for the lines
            line_names=list(providers.values()),
            # line styles
            styles=[('green', '-'), ('green', '--'), ('blue', '-'), ('blue', '--')],
            ylabel=['GB/s', 'TFlops'],  # label name for the y-axis
            plot_name=plot_name,
            # name for the plot. Used also as a file name for saving the plot.
            args={},
        ))
    def benchmark(B, M, N, K, provider):
```
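Applied to this conversion benchmark, the same structure might look roughly like the following sketch (names and values are hypothetical, `import triton_kernels_benchmark as benchmark_suite` is assumed, and the timing call falls back to triton.testing.do_bench, whereas the suite's own measurement helper should be preferred):

```python
import torch
import triton
import triton_kernels_benchmark as benchmark_suite


@benchmark_suite.perf_report(
    benchmark_suite.Benchmark(
        x_names=['N'],  # number of elements on the x-axis
        x_vals=[2**i for i in range(12, 27, 2)],
        line_arg='provider',
        line_vals=['triton'],
        line_names=['Triton'],
        styles=[('green', '-')],
        ylabel=['GB/s'],
        plot_name='fp32-to-fp16-conversion',  # hypothetical plot/file name
        args={},
    ))
def benchmark(N, provider):
    x = torch.rand(N, device='xpu', dtype=torch.float32)

    # Placeholder workload: a plain torch cast; the PR's Triton conversion
    # kernels would be launched here instead.
    ms, min_ms, max_ms = triton.testing.do_bench(lambda: x.to(torch.float16),
                                                 quantiles=[0.5, 0.0, 1.0])

    def gbps(t_ms):
        # fp32 read (4 B) + fp16 write (2 B) per element; t_ms is in milliseconds.
        return x.numel() * (4 + 2) / t_ms * 1e-6

    return gbps(ms), gbps(max_ms), gbps(min_ms)
```

The resulting Mark object can then be executed via its .run method, e.g. `benchmark.run(show_plots=False, print_data=True)`.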
Switched to triton_kernels_benchmark, but now there are many more results. I've grouped them into 8 benchmarks, one for each pair of types:
fp8e4nv-fp16:
N fp8e4nv->fp16-GB/s fp16->fp8e4nv-rtne-GB/s fp8e4nv->fp16-GB/s-min fp16->fp8e4nv-rtne-GB/s-min fp8e4nv->fp16-GB/s-max fp16->fp8e4nv-rtne-GB/s-max fp8e4nv->fp16-CV fp16->fp8e4nv-rtne-CV
0 4096.0 0.009099 0.024323 0.007853 0.024311 0.010699 0.024335 0.119163 0.000359
1 16384.0 0.042830 0.097263 0.042824 0.097246 0.042845 0.097292 0.000268 0.000246
2 65536.0 0.171211 0.388892 0.171152 0.388799 0.171237 0.388984 0.000311 0.000254
3 262144.0 0.514351 1.515020 0.514218 1.512835 0.514573 1.517318 0.000333 0.001386
4 1048576.0 1.211834 4.166472 1.206658 4.166042 1.215926 4.167631 0.005078 0.000204
5 4194304.0 1.627807 7.824465 1.627375 7.666599 1.628487 7.990247 0.000360 0.021691
6 16777216.0 1.961954 10.152995 1.960651 10.151963 1.963331 10.153733 0.000892 0.000100
7 67108864.0 2.121951 10.910362 2.121518 10.909286 2.122361 10.911670 0.000233 0.000094
fp8e4nv-bf16:
N fp8e4nv->bf16-GB/s bf16->fp8e4nv-rtne-GB/s fp8e4nv->bf16-GB/s-min bf16->fp8e4nv-rtne-GB/s-min fp8e4nv->bf16-GB/s-max bf16->fp8e4nv-rtne-GB/s-max fp8e4nv->bf16-CV bf16->fp8e4nv-rtne-CV
0 4096.0 0.010431 0.020689 0.010428 0.020679 0.010432 0.020704 0.000189 0.000471
1 16384.0 0.041715 0.082722 0.041694 0.082707 0.041728 0.082747 0.000361 0.000286
2 65536.0 0.166801 0.330556 0.166707 0.330429 0.166843 0.330723 0.000373 0.000418
3 262144.0 0.508169 1.243508 0.507339 1.241306 0.508504 1.246121 0.001142 0.001853
4 1048576.0 1.170913 2.298198 1.170286 2.297332 1.171415 2.299105 0.000427 0.000372
5 4194304.0 1.590390 4.414360 1.588078 4.253789 1.592187 4.589115 0.001110 0.040270
6 16777216.0 1.927215 5.541919 1.925484 5.541688 1.928841 5.542215 0.000827 0.000049
7 67108864.0 2.078866 5.955846 2.078395 5.955534 2.079470 5.956143 0.000249 0.000041
fp8e4nv-fp32:
N fp8e4nv->fp32-GB/s fp32->fp8e4nv-rtne-GB/s fp32->fp8e4nv-rtz-GB/s fp8e4nv->fp32-GB/s-min fp32->fp8e4nv-rtne-GB/s-min fp32->fp8e4nv-rtz-GB/s-min fp8e4nv->fp32-GB/s-max fp32->fp8e4nv-rtne-GB/s-max fp32->fp8e4nv-rtz-GB/s-max fp8e4nv->fp32-CV fp32->fp8e4nv-rtne-CV fp32->fp8e4nv-rtz-CV
0 4096.0 0.010362 0.028596 0.220511 0.010355 0.028587 0.220215 0.010369 0.028603 0.220785 0.000572 0.000277 0.001602
1 16384.0 0.041439 0.114354 0.879678 0.041421 0.114350 0.878593 0.041457 0.114356 0.880860 0.000401 0.000099 0.001598
2 65536.0 0.165713 0.456808 3.507412 0.165682 0.456735 3.506849 0.165763 0.456913 3.508351 0.000286 0.000255 0.000757
3 262144.0 0.500332 1.804281 12.206938 0.499604 1.803710 12.181413 0.500887 1.805399 12.231430 0.001165 0.000460 0.001846
4 1048576.0 1.193028 6.527287 31.324152 1.192344 6.523916 31.237370 1.194213 6.531394 31.402012 0.001297 0.000556 0.002341
5 4194304.0 1.599378 12.358361 46.772281 1.598380 12.115543 46.425925 1.600313 12.566826 47.254437 0.000648 0.014158 0.025110
6 16777216.0 1.926104 18.816976 55.924057 1.925887 18.813263 55.880840 1.926340 18.820691 55.949418 0.000105 0.000181 0.000633
7 67108864.0 2.082396 18.749761 57.819820 2.082237 18.684158 57.809411 2.082518 18.789707 57.828937 0.000069 0.002619 0.000188
fp8e5-fp16:
N fp8e5->fp16-GB/s fp16->fp8e5-rtne-GB/s fp8e5->fp16-GB/s-min fp16->fp8e5-rtne-GB/s-min fp8e5->fp16-GB/s-max fp16->fp8e5-rtne-GB/s-max fp8e5->fp16-CV fp16->fp8e5-rtne-CV
0 4096.0 0.090101 0.073537 0.089762 0.073458 0.090459 0.073584 0.003488 0.001016
1 16384.0 0.360405 0.293673 0.359298 0.293326 0.360817 0.293915 0.002256 0.001314
2 65536.0 1.435304 1.172799 1.434174 1.171625 1.437193 1.173639 0.003717 0.001082
3 262144.0 5.674113 4.018765 5.645762 4.015686 5.688889 4.020613 0.004035 0.000635
4 1048576.0 12.052598 5.826393 12.042631 5.817665 12.069245 5.833200 0.001300 0.001061
5 4194304.0 20.661597 11.960147 19.493159 11.722220 21.315986 12.711861 0.058014 0.036636
6 16777216.0 29.556085 14.154166 29.527371 14.151587 29.590698 14.155981 0.000950 0.000153
7 67108864.0 31.602352 15.201539 31.593605 15.141837 31.612656 15.238494 0.000372 0.003269
fp8e5-bf16:
N fp8e5->bf16-GB/s bf16->fp8e5-rtne-GB/s fp8e5->bf16-GB/s-min bf16->fp8e5-rtne-GB/s-min fp8e5->bf16-GB/s-max bf16->fp8e5-rtne-GB/s-max fp8e5->bf16-CV bf16->fp8e5-rtne-CV
0 4096.0 0.010617 0.020971 0.010614 0.020965 0.010618 0.020975 0.000192 0.000310
1 16384.0 0.042465 0.083883 0.042451 0.083859 0.042476 0.083900 0.000380 0.000310
2 65536.0 0.169765 0.335034 0.169733 0.334914 0.169818 0.335188 0.000293 0.000341
3 262144.0 0.508051 1.238690 0.507874 1.218866 0.508189 1.260308 0.000259 0.016453
4 1048576.0 1.166588 2.303297 1.165706 2.303064 1.167035 2.303631 0.000671 0.000161
5 4194304.0 1.574403 4.553186 1.572210 4.525967 1.575782 4.597888 0.000999 0.027275
6 16777216.0 1.908402 5.553034 1.906225 5.552781 1.910218 5.553339 0.000954 0.000059
7 67108864.0 2.059672 5.925414 2.058876 5.922217 2.060320 5.930591 0.000308 0.002324
fp8e5-fp32:
N fp8e5->fp32-GB/s fp32->fp8e5-rtne-GB/s fp32->fp8e5-rtz-GB/s fp8e5->fp32-GB/s-min fp32->fp8e5-rtne-GB/s-min fp32->fp8e5-rtz-GB/s-min fp8e5->fp32-GB/s-max fp32->fp8e5-rtne-GB/s-max fp32->fp8e5-rtz-GB/s-max fp8e5->fp32-CV fp32->fp8e5-rtne-CV fp32->fp8e5-rtz-CV
0 4096.0 0.045230 0.051425 0.346238 0.045214 0.051406 0.345479 0.045246 0.051437 0.347354 0.000944 0.000355 0.003939
1 16384.0 0.180759 0.205635 1.384368 0.180600 0.205622 1.383784 0.180919 0.205643 1.384719 0.000944 0.000178 0.001195
2 65536.0 0.720809 0.821871 5.509542 0.718344 0.821583 5.497986 0.722399 0.822077 5.520216 0.002619 0.000379 0.002502
3 262144.0 2.875647 3.222620 19.814360 2.874386 3.213494 19.775497 2.879438 3.232515 19.859394 0.000812 0.002902 0.003130
4 1048576.0 8.786459 10.241500 58.596032 8.780279 10.234403 58.383962 8.799141 10.248006 58.776682 0.001510 0.000658 0.002911
5 4194304.0 15.368254 18.575305 89.136205 14.555469 18.125147 88.234263 16.153809 19.050470 89.944760 0.040169 0.027046 0.017450
6 16777216.0 21.927561 25.998892 110.101176 21.904659 25.994424 109.902104 21.952351 26.003124 110.295157 0.001060 0.000165 0.001898
7 67108864.0 23.479745 27.945724 118.175422 23.471203 27.936418 118.097838 23.491448 27.958579 118.261008 0.000419 0.001028 0.000936
fp16-fp32:
N fp16->fp32-GB/s fp32->fp16-rtne-GB/s fp32->fp16-rtz-GB/s fp16->fp32-GB/s-min fp32->fp16-rtne-GB/s-min fp32->fp16-rtz-GB/s-min fp16->fp32-GB/s-max fp32->fp16-rtne-GB/s-max fp32->fp16-rtz-GB/s-max fp16->fp32-CV fp32->fp16-rtne-CV fp32->fp16-rtz-CV
0 4096.0 0.151311 0.495884 0.640500 0.150943 0.492308 0.635236 0.151929 0.499512 0.644836 0.003135 0.011283 0.007796
1 16384.0 0.601910 1.978744 2.520615 0.600234 1.967339 2.497561 0.602707 1.988349 2.531520 0.002605 0.005164 0.006579
2 65536.0 2.398829 7.886402 10.105782 2.395322 7.876923 10.039216 2.402346 7.914976 10.176397 0.001565 0.002228 0.006945
3 262144.0 9.556836 30.840469 36.459528 9.547785 30.624299 36.368480 9.581287 30.942398 36.653244 0.001869 0.005031 0.004931
4 1048576.0 32.911989 66.788275 85.773093 32.572565 66.534007 84.836246 33.098991 67.216412 86.802651 0.007563 0.004307 0.009884
5 4194304.0 52.632751 111.535810 125.427760 51.940559 108.323965 124.356737 54.398006 116.095662 126.884793 0.020584 0.029834 0.009612
6 16777216.0 83.593503 183.247404 166.796403 83.332753 182.726495 165.182107 83.919648 183.590298 168.014093 0.002849 0.002184 0.008465
7 67108864.0 93.211930 206.396724 183.397635 93.128115 205.986829 183.197372 93.279308 206.789128 183.574224 0.000715 0.001967 0.001007
bf16-fp32:
N bf16->fp32-GB/s fp32->bf16-rtne-GB/s fp32->bf16-rtz-GB/s bf16->fp32-GB/s-min fp32->bf16-rtne-GB/s-min fp32->bf16-rtz-GB/s-min bf16->fp32-GB/s-max fp32->bf16-rtne-GB/s-max fp32->bf16-rtz-GB/s-max bf16->fp32-CV fp32->bf16-rtne-CV fp32->bf16-rtz-CV
0 4096.0 0.216262 0.496485 0.140877 0.215852 0.494686 0.140659 0.217133 0.497570 0.141047 0.003744 0.003428 0.001148
1 16384.0 0.858250 1.979939 0.562637 0.856904 1.976834 0.561866 0.860504 1.988349 0.563411 0.002169 0.003098 0.001038
2 65536.0 3.427615 7.867467 2.246307 3.413333 7.839234 2.243769 3.442017 7.884504 2.248079 0.003163 0.003395 0.001256
3 262144.0 13.554499 30.464150 8.862205 13.518151 30.340742 8.853823 13.596681 30.538676 8.868200 0.004282 0.004607 0.001022
4 1048576.0 27.018191 63.879128 18.462470 26.969546 63.689016 18.440067 27.092187 64.125246 18.476459 0.002321 0.004308 0.001328
5 4194304.0 41.347635 109.755436 29.842077 41.139987 103.226614 28.489266 41.689568 116.095650 30.753634 0.005780 0.056834 0.043698
6 16777216.0 66.565688 172.773959 46.072265 66.323594 172.434798 46.043690 66.741518 173.189538 46.121663 0.002899 0.002157 0.000737
7 67108864.0 73.298160 193.933828 49.872075 73.071499 193.687548 49.631154 73.524945 194.266189 50.059130 0.002932 0.001379 0.003740
Force-pushed from 0b17de9 to 4cc1041
Force-pushed from 4cc1041 to c1d4de6
The benchmarks failed with an OOM error, but I see the same issue on the main branch: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/17038905956/job/48297736887
It should be fixed by #4913.
Force-pushed from c1d4de6 to a1688f8
Closes #4799
Here is an example of the benchmark report:
And the benchmark run results - https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/16596562534/job/46944550498